Get Pages (Web Mining)

Synopsis

Retrieves the pages at the URLs listed in an attribute and stores them in a new attribute.

Description

This operator retrieves the pages whose URLs are contained in the input data set. For each row in the data set, the URL is read from the attribute specified by the parameter link attribute. A GET request is sent to that URL and the returned page is stored in a new attribute, whose name is given by the parameter page attribute.
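The row-by-row behaviour described above can be sketched in Python. This is only an illustration under stated assumptions, not the operator's actual implementation: the data set is modelled as a list of dictionaries, and the names fetch_page, get_pages, and the default user agent string are hypothetical.

```python
from urllib.request import Request, urlopen


def fetch_page(url, user_agent="example-agent/1.0", timeout_s=10.0):
    """Send a GET request to `url` and return the response body as text.

    The user agent string and timeout are illustrative defaults only.
    """
    request = Request(url, headers={"User-Agent": user_agent}, method="GET")
    with urlopen(request, timeout=timeout_s) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(rows, link_attribute, page_attribute, fetch=fetch_page):
    """Return copies of `rows` with the fetched page added to each row.

    `rows` is a list of dicts standing in for the example set (assumption).
    `fetch` can be replaced, for example to test without network access.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[page_attribute] = fetch(row[link_attribute])
        result.append(new_row)
    return result
```

Passing the fetch function as an argument keeps the loop independent of the network, so the same sketch can be tried with a stub such as `lambda url: "<html></html>"`.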

Input

  • Example Set (Data table)

    The Example Set containing the attribute with the URLs.

Output

  • Example Set (Data table)

    The Example Set with the retrieved pages stored in the new page attribute.

Parameters

  • link attribute: The attribute that contains the URLs.
  • page attribute: The name of the attribute that should contain the pages.
  • random user agent: Chooses a user agent at random from a set of 7000 user agents.
  • user agent: The user agent property sent with each request.
  • connection timeout: The timeout (in ms) for the connection.
  • read timeout: The timeout (in ms) for reading from the URL.
  • follow redirects: Specifies whether redirects should be followed.
  • accept cookies: Specifies whether cookies should be accepted.
  • cookie scope: Specifies the scope of the cookies used.
  • request method: Specifies the request method.
  • delay: Specifies whether execution is not delayed, delayed by a fixed amount of time, or delayed by a random amount of time.
  • delay amount: The delay amount in ms.
  • min delay amount: The minimum delay amount in ms.
  • max delay amount: The maximum delay amount in ms.
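
How the three delay parameters might interact can be sketched as follows. The mode names "none", "fixed", and "random", the default values, and the uniform distribution between the minimum and maximum are assumptions made for illustration; the documentation above does not state them.

```python
import random
import time


def compute_delay_ms(delay, delay_amount=1000,
                     min_delay_amount=0, max_delay_amount=1000,
                     rng=random):
    """Return the pause (in ms) to apply before a request.

    `delay` selects the mode; the mode names are hypothetical labels
    for the three behaviours described by the delay parameter.
    """
    if delay == "none":
        return 0
    if delay == "fixed":
        return delay_amount
    if delay == "random":
        # Assumption: uniform choice, both bounds inclusive.
        return rng.randint(min_delay_amount, max_delay_amount)
    raise ValueError(f"unknown delay mode: {delay!r}")


def wait_before_request(delay, **amounts):
    """Sleep for the computed delay; time.sleep expects seconds."""
    time.sleep(compute_delay_ms(delay, **amounts) / 1000.0)
```

A random delay between requests is a common way to avoid sending a burst of requests to the same server, which is presumably why both a fixed and a random mode are offered.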